Quantitative Evaluation of Performance and Validity Indices for Clustering the Web Navigational Sessions

نویسندگان

  • Zahid Ahmed Ansari
  • M. F. Azeem
  • Waseem Ahmed
  • A. Vinaya Babu
چکیده

Clustering techniques are widely used in “Web Usage Mining” to capture similar interests and trends among users accessing a Web site. For this purpose, web access logs generated at a particular web site are preprocessed to discover the user navigational sessions. Clustering techniques are then applied to group the user session data into user session clusters, where intercluster similarities are minimized while the intra cluster similarities are maximized. Since the application of different clustering algorithms generally results in different sets of cluster formation, it is important to evaluate the performance of these methods in terms of accuracy and validity of the clusters, and also the time required to generate them, using appropriate performance measures. This paper describes various validity and accuracy measures including Dunn’s Index, Davies Bouldin Index, C Index, Rand Index, Jaccard Index, Silhouette Index, Fowlkes Mallows and Sum of the Squared Error (SSE). We conducted the performance evaluation of the following clustering techniques: k-Means, k-Medoids, Leader, Single Link Agglomerative Hierarchical and DBSCAN. These techniques are implemented and tested against the Web user navigational data. Finally their performance results are presented and compared. Keywords-component; web usage mining; clustering techniques, cluster validity indices, performance evaluation.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Fuzzy Approach for Feature Evaluation and Dimensionality Reduction to Improve the Quality of Web Usage Mining Results

The explosive growth in the information available on the Web has necessitated the need for developing Web personalization systems that understand user preferences to dynamically serve customized content to individual users. Web server access logs contain substantial data about the accesses of users to a Web site. Hence, if properly exploited, the log data can reveal useful information about the...

متن کامل

Model-Based Cluster Analysis for Web Users Sessions

One of the main issues in Web usage mining is the discovery of patterns in the navigational behavior of Web users. Standard approaches, such as clustering of users’ sessions and discovering association rules or frequent navigational paths, do not generally allow to characterize or quantify the unobservable factors that lead to common navigational patterns. Therefore, it is necessary to develop ...

متن کامل

A density based clustering approach to distinguish between web robot and human requests to a web server

Today world's dependence on the Internet and the emerging of Web 2.0 applications is significantly increasing the requirement of web robots crawling the sites to support services and technologies. Regardless of the advantages of robots, they may occupy the bandwidth and reduce the performance of web servers. Despite a variety of researches, there is no accurate method for classifying huge data ...

متن کامل

Developing and Psychometrics "Scale of Performance Evaluation Criteria for University Faculty Members"

Introduction: Evaluation of university teachers is a necessity in order to improve the knowledge of the community. In this regard, the performance evaluation instruments of university faculty members are different. Therefore, the aim of present study was to develop and Psychometrics "Scale of Performance Evaluation Criteria for University Faculty Members". Methods: The present study is of metho...

متن کامل

انتخاب اعضای ترکیب در خوشه‌بندی ترکیبی با استفاده از رأی‌گیری

Clustering is the process of division of a dataset into subsets that are called clusters, so that objects within a cluster are similar to each other and different from objects of the other clusters. So far, a lot of algorithms in different approaches have been created for the clustering. An effective choice (can combine) two or more of these algorithms for solving the clustering problem. Ensemb...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1507.03340  شماره 

صفحات  -

تاریخ انتشار 2011